Introduction

This document describes how to use AvenioUpdate
to update and explore data generated with the Avenio pipeline at AUH.
The package depends on two files. The first is an R data file
(.rds) which contains all the Avenio results for each
sample. This file can be loaded into R where the data is visible. The
second is an Excel file (.xlsx) which contains all the
basic information about the samples run on the NGS machine. Both files
are located at //Synology_m1/Synology_folder/AVENIO/
and are explained in much greater detail below.
As of late 2025 the package contains four main
functions (add_run_to_list(), add_new_key(),
extract_project() and create_simple_output()) which are used
to update the results file and to get an overview of the results for a
single patient. In addition, the package contains four smaller functions
which are used to get some statistics on the data we have collected and
to make sure the results file is updated correctly.
Installation
This package is not published on repositories like CRAN or
Bioconductor. However, it can still be installed
from my GitHub page as demonstrated below:
# Only needs to be run once
if (!require(devtools)) install.packages('devtools')
library(devtools)
# Only needs to be run once (Or when the package is updated)
devtools::install_github("CTrierMaansson/AvenioUpdate")
# Needs to be run every time you open RStudio
library(AvenioUpdate)
To check that the package has been installed correctly, run the
following command:
It should give an output like this:
#> Name Description
#> 1 Basestats Number of samples, runs, patients, different materials etc.
#> 2 Projectstats Number of patients and samples in each project
#> 3 Missing Samples present in AVENIO_runs.xlsx but not present in the results data set
#> 4 All_mutations Number of times each gene is mutated across all samples
#> 5 Relevant_SNV Number of SNVs detected not classified as BC or synonymous mutations
#> 6 Relevant_INDEL Number of INDELs detected not classified as BC or synonymous mutations
#> 7 BC_in_plasma Number of times each gene is mutated in plasma but classified as BC mutation
#> 8 Fusions_project Number of patients for each project with detectable EML4-ALK with DNAfusion
#> 9 Fusions_sample Number of samples for each project with detectable EML4-ALK with DNAfusion
#> 10 Fusions_variant Number of patients with each EML4-ALK variant
#> 11 Fusions_NC Information on the non classified EML4-ALK variants
#> 12 Lengths Distribution of fragmentlengths across different materials
#> 13 Depths Distribution of unique depths across different materials
#> 14 Reads Distribution of mapped reads across different materials
#> 15 On_target Distribution of on target percents across different materials
Sometimes installing devtools does not
configure git correctly with R, and you therefore have to
install git manually.
If you use Windows, install git via https://git-scm.com/downloads/win.
Use the standalone installer, 64-bit version. After you have installed
git, test whether it has been installed correctly by opening the
terminal: press the Windows key on your keyboard,
type cmd and press enter. In the terminal type “git” and
press enter. If git has been installed correctly you should see “usage:
git [-v | --version] [-h | --help] …”. If you do not see this message,
restart your computer and try again.
If you use Mac, install git via https://git-scm.com/download/mac
and use Homebrew for the installation. Test whether
git has been installed correctly as explained above, but
open the terminal by pressing cmd + space and
typing Terminal.
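Whether git is visible to R can also be checked from the R console. The following is a minimal sketch using base R's Sys.which(), which returns an empty string when an executable cannot be found on the PATH. The variable names are only illustrative.

```r
# Look for the git executable on the PATH that R sees
git_path <- Sys.which("git")

# nzchar() is TRUE when a non-empty path was returned
git_found <- nzchar(git_path)[[1]]

if (git_found) {
  message("git found at: ", git_path)
} else {
  message("git was not found; install it as described above and restart RStudio")
}
```

If git works in the terminal but is not found here, restarting RStudio usually lets R pick up the updated PATH.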
Version
This manual was created using the following version of
AvenioUpdate:
packageVersion("AvenioUpdate")
#> [1] '1.13.0'
Try running packageVersion("AvenioUpdate"). If you do
not get the same result, run the installation command again:
devtools::install_github("CTrierMaansson/AvenioUpdate")
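The version comparison can also be scripted. This is a minimal sketch, not part of AvenioUpdate: check_avenio_version() is a hypothetical helper, and it returns a status string rather than failing when the package is missing.

```r
# Version this manual was written for (taken from the output above)
manual_version <- package_version("1.13.0")

# Hypothetical helper: compares the installed version with the manual's version
check_avenio_version <- function(expected) {
  if (!requireNamespace("AvenioUpdate", quietly = TRUE)) {
    return("not installed")
  }
  installed <- utils::packageVersion("AvenioUpdate")
  if (installed == expected) "matches manual" else "differs from manual"
}

check_avenio_version(manual_version)
```

If the result is "differs from manual", rerunning the installation command above should bring the package up to date.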
How it works
As mentioned, the package depends on two files located on the Synology. This means that in
order to use the package you must have access to the Synology folder.
Instructions for obtaining access are found in /OneDrive/Lung cancer group ALABS/How-to guides.
Once you have gained access to the Synology you must stay connected using the
Ethernet cable connection IPv4 address: 10.124.6.78
(health.client.au) or IPv4 address: 10.60.24.79 (onerm).
Note that both AU and region networks can establish
the connection to the Synology.
The AVENIO_runs.xlsx file
This file is located at //Synology_m1/Synology_folder/AVENIO/
and contains all the relevant information for all NGS samples across
different projects. The file is used to connect CPR numbers and other
relevant information with the NGS results from the Avenio output. When
kept up to date, the file serves as a key file that lets us work around
typing errors, poorly designed sample names, non-unique sample names
across different runs, etc. in the Avenio system and maintain
consistency across all our projects. Yes! Very neat!
However…
THIS FILE IS UPDATED MANUALLY
The file can be opened in Excel where the samples can be added
accordingly. It is VERY important that samples/entries
are not deleted from the file. Christoffer makes sure there is a backup
of the file on GenomeDK, which is updated regularly. If you make errors
when entering your information, no worries, this can be fixed without
any repercussions. However, it is very difficult to recover deleted
data.
So as long as you don’t delete anything or change entries not
belonging to you all is fine!
Nevertheless, there are some rules regarding the
file:
- Do not delete anything you are not supposed to
- Do not delete anything you are not supposed to
- Never move the file from its directory //Synology_m1/Synology_folder/AVENIO/
- Do not make your own copy of the file and store it
locally
- Format the information in the correct format (see below)
- Try to avoid adding samples with incomplete
information
- Avoid the use of special characters like ”ÆØÅ½#$”
etc. but “_” is okay, EXCEPT IN PROJECT AND
NAME_IN_PROJECT!
- And do not delete anything you are not supposed
to
Variables in AVENIO_runs.xlsx
This table briefly shows the general requirements for all variables
in AVENIO_runs.xlsx. For details please see the descriptions below.
For a detailed description of how these variables are combined and
used in the context of different types of samples and analyses, see the section
on sample_index.
| Variable | Included in sample_index | Uniqueness | Formatting requirements | add_new_key() required | Is required | Can contain ’_’ |
|---|---|---|---|---|---|---|
| CPR | No | Within project | Patient specific | No | Yes | Yes |
| Name_in_project | Yes | Within project | Project specific | No | Yes | No |
| Project | Yes | None | Project specific | Yes | Yes | No |
| Sample_number | No | None | None | No | No | Yes |
| Sample_date | Yes | None | YYYY-MM-DD | No | Yes | No |
| Run_name | No | Run specific | None | No | No | Yes |
| Run_ID | No | Run specific | 24 characters | No | Yes | Yes |
| Sample_name | No | Within run | None | No | Yes | Yes |
| Sample_note | (Yes) | None | None | Yes | Yes | Yes |
| Material | (Yes) | None | None | Yes | Yes | Yes |
Sample_note and Material have (Yes) in the “Included in sample_index”
column because whether they are included in sample_index depends on
what these columns contain (see below).
CPR
The CPR number of the patient if such number is known/exists. This
number is unique to every patient and is used to explore the results.
The number is used to extract the information from each patient in
create_simple_output().
As explained below, AVENIO_results_patients.rds is a
named list of data.frames where each
data.frame is named with the CPR number of the patient.
Therefore you will be able to explore all Avenio information, including
sample metrics, using:
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# Extracting the data for a specific patient using the CPR number
results$`<CPR_number>`
OR
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# Extracting the data for a specific patient using the CPR number
results[["<CPR_number>"]]
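The two access styles can be tried on a small mock list without a connection to the Synology. This is a sketch only: results_mock and its contents are invented for illustration, and the real data.frames contain many more columns.

```r
# Mock results list; the CPR numbers and values are invented for illustration
results_mock <- list(
  "1401362941" = data.frame(Gene = c("EGFR", "RET"), stringsAsFactors = FALSE),
  "0101010101" = data.frame(Gene = NA_character_, stringsAsFactors = FALSE)
)

# CPR numbers start with a digit, so `$` needs backticks around the name
by_dollar <- results_mock$`1401362941`

# With [[ ]] the name is passed as a quoted string instead
by_brackets <- results_mock[["1401362941"]]

# Both styles return the same data.frame
identical(by_dollar, by_brackets)

# Check whether a patient is present before extracting
"1401362941" %in% names(results_mock)
```

The `[[ ]]` form is convenient when the CPR number is stored in a variable, since `$` only accepts a literal name.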
Name_in_project
This is the name given to a patient in a project. This name
must be unique for the patients within a specific
project, but the same name can be used across different projects.
Examples of names are pt26, 26, patient26 etc. Because of
how the package works, ”_” cannot be included in
Name_in_project.
It is very important that for a given project a
single CPR is only assigned to a single Name_in_project. The same
CPR can be included in different projects with different
Name_in_project.
If any entry in the file contains “_” in Name_in_project, or multiple
Name_in_project are assigned to the same CPR number within a project, it will result in
an ERROR.
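Both rules can be checked on your own rows before running the package. The sketch below uses base R on an invented data.frame; the column names follow AVENIO_runs.xlsx, but the checks are an illustration and not the package's internal code.

```r
# Invented example rows using the column names from AVENIO_runs.xlsx
runs <- data.frame(
  CPR = c("0000000001", "0000000001", "0000000002", "0000000003", "0000000003"),
  Project = c("ProjA", "ProjB", "ProjA", "ProjA", "ProjA"),
  Name_in_project = c("pt26", "26", "pt_27", "pt30", "patient30"),
  stringsAsFactors = FALSE
)

# Rule 1: Name_in_project must not contain "_"
has_underscore <- grepl("_", runs$Name_in_project, fixed = TRUE)
runs[has_underscore, ]

# Rule 2: within a project, one CPR may only have one Name_in_project
pairs <- unique(runs[, c("Project", "CPR", "Name_in_project")])
key <- paste(pairs$Project, pairs$CPR)
conflict_keys <- unique(key[duplicated(key)])
conflicts <- pairs[key %in% conflict_keys, ]
conflicts
```

Here the row with pt_27 breaks the first rule, and CPR 0000000003 breaks the second because it has two names in ProjA. CPR 0000000001 is fine, since its two names belong to different projects.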
Project
This is the project the patient is part of. This is used to group
patients as unique entries. It can also be used to quickly collect the
patient information for your specific project by filtering on the
project name. Because of how the package works, ”_”
cannot be included in Project.
If any entry in the file contains “_” in Project, it will result in
an ERROR.
To include a new project run add_new_key(), and to see the currently
included projects run:
readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_keys.rds")
Sample_date
This is the date of the sample collection. It is important to
format the date correctly. The date should be
formatted as YYYY-MM-DD. It happens often that the sample
dates for buffy coat (BC) and baseline (BL) samples are identical. This
is not an issue as long as the Material column is also
filled out correctly.
When the data is collected and analyzed the algorithm checks the
format of the date and if it is not correct it will result in an
ERROR
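Dates can be checked before they are entered. The following is a minimal sketch in base R; is_valid_sample_date() is a hypothetical helper and not the check used inside AvenioUpdate. It combines a pattern test with a calendar test, because a pattern alone would accept impossible dates.

```r
# Hypothetical helper: TRUE only for real calendar dates written as YYYY-MM-DD
is_valid_sample_date <- function(x) {
  # Four digits, dash, two digits, dash, two digits
  well_formed <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # as.Date() returns NA for dates that do not exist, e.g. 30 February
  parsed <- as.Date(x, format = "%Y-%m-%d")
  well_formed & !is.na(parsed)
}

dates <- c("2019-11-13", "13-11-2019", "2019/11/13", "2019-02-30")
date_ok <- is_valid_sample_date(dates)
data.frame(Sample_date = dates, valid = date_ok)
```

Only the first entry passes: the second and third are in the wrong format, and the fourth is not a real date.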
Sample_number
This is the number corresponding to the blood sample. It is not used by
the code, but it can be useful for connecting sample dates and sample
numbers. If the number is unknown you can just write “Unknown”.
Run_name
This is the name of the run as it appears in the Avenio system. While
this name is not used specifically in the algorithm it is good to have
the name in the file to have a record of which Run_name matches which
Run_ID.
Run_ID
This is the run ID, which is randomly generated by the
Avenio system. This is an important ID because it is the name of the
folder where the results from Avenio are located on the Synology server.
The Run_ID is always 24 characters long and starts with an “A”. If the
algorithm detects a Run_ID that is not 24 characters long it will result
in an ERROR.
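The two properties of a Run_ID can be tested with base R. This is a sketch: is_valid_run_id() is a hypothetical helper, and according to the text above only the length is checked by the package itself. The first ID is the 24-character string that follows the Plasma- prefix in the example path used in this manual; the other two are deliberately wrong.

```r
# Hypothetical helper: 24 characters long and starting with "A"
is_valid_run_id <- function(run_id) {
  nchar(run_id) == 24 & startsWith(run_id, "A")
}

run_ids <- c(
  "ANei_Hsi__lGmaJ1k_Ji_d3O",  # taken from the example path in this manual
  "ANei_Hsi",                  # too short
  "BNei_Hsi__lGmaJ1k_Ji_d3O"   # correct length but wrong first letter
)
run_id_ok <- is_valid_run_id(run_ids)
data.frame(Run_ID = run_ids, valid = run_id_ok)
```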
Sample_name
This is the name of the sample as it appears in the Avenio system.
This name is only unique to the specific run and is used in the
algorithm to extract the BAM files. The algorithm tests if the specific
run contains a folder with the specific Sample_name and if it cannot
find a folder with the specific Sample_name it will result in an
ERROR.
Sample_note
This is a note that can be added to the sample and it is not used in
the algorithm. However, it is nice to include to get a quick overview of
the samples that have been analyzed for the specific patient. In general
the variable refers to time of blood sampling in relation to treatment
initiation or if the sample is a BC sample. There is no limitation on
what can be included in this column, although new possible entries have to
be added with add_new_key().
To see the currently included Sample_notes run:
readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_keys.rds")
Material
This is a mandatory column.
The variable refers to the source of the material analyzed. In most
cases this is “cfDNA” or “BC” which reflects purified plasma cfDNA or
DNA from PBMCs, respectively. Other sources include “size_selection” or
“cfChIP”. This is a mandatory column because it allows the same blood
sample (same Project, Name_in_project, Sample_date) to be analyzed
multiple times. If Material is not filled out the algorithm will result
in an ERROR.
The classification of “BC” samples is important because
non-BC samples from a patient have variants “flagged” according to the
variants detected in the BC sample.
reanalyze has been added as a type of Material. This is used
when a sample has been analyzed more than once. This can be used to
discriminate between the two analyses which would otherwise have the
same Project, Name_in_project, and Sample_date.
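Why Material is needed to tell samples apart can be shown with duplicated() on a few invented rows. This is a sketch of the idea only, not the package's own check, and the project and patient names are made up.

```r
# Three analyses of blood drawn from the same patient on the same day
runs <- data.frame(
  Project = c("ProjA", "ProjA", "ProjA"),
  Name_in_project = c("pt26", "pt26", "pt26"),
  Sample_date = c("2019-11-13", "2019-11-13", "2019-11-13"),
  Material = c("cfDNA", "BC", "cfDNA"),
  stringsAsFactors = FALSE
)

without_material <- c("Project", "Name_in_project", "Sample_date")
with_material <- c(without_material, "Material")

# Without Material, rows 2 and 3 are indistinguishable from row 1
n_dup_without <- sum(duplicated(runs[without_material]))

# With Material, only the second cfDNA analysis still clashes
n_dup_with <- sum(duplicated(runs[with_material]))

# Marking the repeated analysis as "reanalyze" makes every row unique
runs$Material[3] <- "reanalyze"
n_dup_reanalyze <- sum(duplicated(runs[with_material]))

c(n_dup_without, n_dup_with, n_dup_reanalyze)
```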
If the NGS has been executed using the tissue version of AVENIO, enter
“tissue” in Material. This tells AvenioUpdate that the
output format of the results .csv files is different from the plasma
version of AVENIO. If it is a BC sample that has been analyzed with the
tissue version of AVENIO, enter “BC” in Sample_note and
“tissue” in Material. If it is a tumor sample that has
been analyzed with the tissue AVENIO protocol, use “tumor_BL” or
“tumor_Tx” in Sample_note.
There is no limitation on what can be included in this column,
although new possible entries have to be added with add_new_key().
To see current included Materials run:
readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_keys.rds")
The AVENIO_results_patients.rds file
This is the other important file
located at //Synology_m1/Synology_folder/AVENIO/.
However, this file is NOT updated manually by opening
and editing it. Instead, this file is updated automatically by the
algorithm when add_run_to_list() is executed. This means,
when you have updated the AVENIO_runs.xlsx file with new samples and you
want to update the results, you run the following:
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# Example path to the specific run
test_path <- "//Synology_m1/Synology_folder/AVENIO/AVENIO_results/Plasma-ANei_Hsi__lGmaJ1k_Ji_d3O"
# Updating the results file
results <- add_run_to_list(master_list = results,
Directory = test_path)
Then results will contain all of the information from
the AVENIO_runs.xlsx file and AVENIO_results_patients.rds has been
updated accordingly.
More technical note on AVENIO_results_patients.rds (Not mandatory read)
AVENIO_results_patients.rds is an .rds file, which means
it is a saved R object. The file therefore cannot be explored outside of
R. However, the file can be loaded into R where the data is visible. The
file is a named list of data.frames where each
data.frame is named after the CPR number of the
patient.
All the information from the Avenio system is stored in the
data.frames and is extracted from the filtered variants
located in the .csv files. If no variants have been
detected for the sample, all of the other sample metrics such as the
median depth, fragment length, base quality, etc. are still stored in
the data.frames. This also registers that the
sample has been analyzed, so that if other samples with detectable variants
have been analyzed for that patient, these variants are
investigated in the original BAM files for the patient.
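How a named list of data.frames behaves as an .rds file can be tried with a temporary file. The sketch below uses invented CPR numbers and only two columns, whereas the real data.frames hold many more variables.

```r
# Invented miniature version of the results object
patients <- list(
  "1401362941" = data.frame(
    sample_index = c("ProjA_pt26_191113", "ProjA_pt26_191113"),
    Gene = c("EGFR", "RET"),
    stringsAsFactors = FALSE
  ),
  # A sample without detected variants still occupies a row
  "0101010101" = data.frame(
    sample_index = "ProjA_pt27_200101",
    Gene = NA_character_,
    stringsAsFactors = FALSE
  )
)

# Save to a temporary .rds file and load it again
tmp <- tempfile(fileext = ".rds")
saveRDS(patients, tmp)
restored <- readRDS(tmp)

# The object comes back unchanged, names included
identical(patients, restored)

# Number of rows stored for each patient
rows_per_patient <- vapply(restored, nrow, integer(1))
rows_per_patient

unlink(tmp)
```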
sample_index
The variable sample_index is created for all samples
when add_run_to_list() is executed. The essential idea
behind this variable is to combine the most basic information about an
analyzed sample and create a unique and consistent name for that sample.
The sample_index naming is inspired by the naming strategy
created by Lærke, which is explained in more detail on OneDrive at /OneDrive/Lung
cancer group ALABS/How-to guides/How to navngiv
patientprøver.pptx
The sample_index is created using the AVENIO_runs.xlsx
file using the following variables:
- Project
- Name_in_project
- Sample_date
- Material
- Sample_note
And with some code which is run internally in
AvenioUpdate:::create_sample_index(), the following
decision tree is used to determine how the sample_index is
formatted and contains the relevant information.

The figure shows how AvenioUpdate runs through the variables for each
sample and based on the inputs it combines the variables in different
formats (grey boxes). The reason behind the decision tree:
- Able to handle samples sequenced with plasma or tissue Avenio
protocols
- Able to identify BC samples
- Able to combine results of two analyses of the same sample, aka.
reanalyze
- Able to add information regarding processing of the sample,
e.g. size_selection
At the bottom of the figure I have given an example of how the
sample_index can look for a patient in the FIOL cohort
where multiple samples have been analyzed for that patient. Samples A to
G are further explained in the figure below, which shows what inputs are
used in the AVENIO_runs.xlsx file.

sample_index is exported as a variable in
create_simple_output() as explained below to give the best
sample context.
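Because the parts of a sample_index are joined with "_", an index can be split back into its components, which also illustrates why "_" is not allowed in Project and Name_in_project. The sketch below assumes the layout Project_Name_YYMMDD with an optional suffix, inferred from examples printed in this manual such as NARLAL_60_191113_BC. parse_sample_index() is a hypothetical helper; the actual rules are those of the decision tree, and tissue samples may be structured differently.

```r
# Hypothetical helper, assuming the layout Project_Name_YYMMDD[_suffix]
parse_sample_index <- function(sample_index) {
  parts <- strsplit(sample_index, "_", fixed = TRUE)[[1]]
  list(
    Project = parts[1],
    Name_in_project = parts[2],
    # Assumes the date is written as YYMMDD
    Sample_date = as.Date(parts[3], format = "%y%m%d"),
    # Anything after the date, e.g. "BC", is kept as a suffix
    Suffix = if (length(parts) > 3) {
      paste(parts[-(1:3)], collapse = "_")
    } else {
      NA_character_
    }
  )
}

bc_sample <- parse_sample_index("NARLAL_60_191113_BC")
plasma_sample <- parse_sample_index("NARLAL_60_191113")
str(bc_sample)
```

If a project were named with an underscore, the split would put part of the project name in the Name_in_project position, and the components could no longer be recovered reliably.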
Synology path
The package depends on access to the Synology folder, which can
be established by following the how-to guide in /OneDrive/Lung cancer group ALABS/How-to guides.
At times, this connection is
established through different mechanisms and this can affect the path to
the Synology folder. The correct (meaning default) path
is //Synology_m1/Synology_folder/ and has been hard coded
into all the functions. This is also the path which is used in this
manual.
However, if the connection to the Synology folder can only be
established through the IP address, the path could be
//10.124.39.251/Synology_folder/, or any other path that has
been created between your computer and the Synology (see above). In that case use the synology_path
argument to specify the exact connection which you have established to
the Synology. If you do not have the default path you should specify the
Synology path every time you run functions, similar to the example
below:
# defining path
syn_path <- "//10.124.39.251/Synology_folder/AVENIO/"
#Example of function needing the synology_path argument
add_run_to_list(...,
synology_path = syn_path)
# '...' means "additional arguments"
And the results file can only be read using:
#Reading the Avenio results:
results <- readRDS("//10.124.39.251/Synology_folder/AVENIO/AVENIO_results_patients.rds")
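To avoid typing the base path more than once, the file paths can be built from a single variable. This is a minimal sketch: avenio_file() is a hypothetical helper that is not part of AvenioUpdate, and it only pastes strings together without touching the network.

```r
# Hypothetical helper: joins a base path and a file name with exactly one "/"
avenio_file <- function(file,
                        synology_path = "//Synology_m1/Synology_folder/AVENIO/") {
  base <- sub("/+$", "", synology_path)  # drop any trailing slashes
  paste0(base, "/", file)
}

# Default path used in this manual
default_results <- avenio_file("AVENIO_results_patients.rds")

# Path when the connection is established through the IP address
syn_path <- "//10.124.39.251/Synology_folder/AVENIO/"
ip_results <- avenio_file("AVENIO_results_patients.rds", synology_path = syn_path)

c(default_results, ip_results)
```

The resulting string can then be passed to readRDS(), while syn_path itself is what the synology_path argument of the package functions expects.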
Main functions
add_run_to_list()
As mentioned, this is the most important function in the package. The
function is used to update the AVENIO_results_patients.rds file
with any new information from the AVENIO_runs.xlsx file.
How it works
How the function works is illustrated below:
The function takes two mandatory arguments:
- master_list which is the AVENIO_results_patients.rds
loaded in using readRDS()
- Directory which is the complete path
to the specific run on the Synology server
And one optional argument:
- synology_path which is the path to the AVENIO_runs.xlsx
and AVENIO_results_patients.rds files. (Default:
“//Synology_m1/Synology_folder/AVENIO/”)
As illustrated in the figure above, the function extracts the
relevant information from the AVENIO_runs.xlsx file. Then it looks through
the established results on all the patients (using the CPR numbers) and
combines the new run with previous runs from patients in the new run.
Then it takes all the identified mutations for each patient and looks
for each mutation in all BAM files generated for the patient. After
this, if a BC sample exists for the patient, all mutations identified in
the non-BC samples are flagged as BC mutations if they are found in the BC
sample as well. Then the results dataset is updated with the new runs,
and existing runs are updated after the BAM files have been
reanalyzed.
In addition to this, the newly added samples are also analyzed
with DNAfusion,
which is our most sensitive method for detecting EML4-ALK fusions. If a
fusion is identified we try to classify the fusion variant using the
classification from this
publication.
Flags and BC mutations
The Flags variable is included to give some context
on the mutation. If the mutation is not found in the Avenio system,
but only found when the BAM file is analyzed, the flag is set to “BAM”.
In that case the MAF and variant depth are
also determined from the BAM file, whereas the standard MAF and variant
depth are determined from the Avenio system.
“DNAfusion” is shown in Flags if the EML4-ALK fusion
is detected by DNAfusion. If the fusion is also detected in
the Avenio system, that fusion is maintained in the output but is not
indicated in Flags.
When add_run_to_list() is executed, the internal
function AvenioUpdate:::renanalyze_samples() annotates the
mutations found in non-BC samples as BC mutations if
they are also found in a BC sample. Just a single mutant read in the BC
sample is enough for AvenioUpdate to identify that specific mutation in
the BC sample. Because of this we discriminate between
certain and uncertain BC mutations in
order to filter variants in the non-BC samples based on varying
confidence of the mutations called in the BC sample:

As shown in the figure, certain BC mutations are
mutations that have been identified in the BC sample by AVENIO
OR in the BAM files with at least 3 mutant reads. These
mutations are annotated with ‘BC_mut’ in the Flags
variable.
Uncertain BC mutations are annotated with
‘uncertain_BC_mut’ in the Flags variable. These
mutations have not been identified by AVENIO in the BC sample but only
in the BAM files, AND the mutation is identified with
fewer than 3 mutant reads in the BAM file of the BC sample.
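The two definitions can be written as a small decision rule. This is a sketch of the logic as described in the text, not the package's internal code: classify_bc_flag() and its argument names are hypothetical.

```r
# Hypothetical helper implementing the certain/uncertain rule described above
#   called_by_avenio : TRUE if AVENIO itself reported the mutation in the BC sample
#   bam_mutant_reads : number of mutant reads found in the BAM file of the BC sample
classify_bc_flag <- function(called_by_avenio, bam_mutant_reads) {
  if (called_by_avenio || bam_mutant_reads >= 3) {
    "BC_mut"            # certain BC mutation
  } else if (bam_mutant_reads >= 1) {
    "uncertain_BC_mut"  # seen in the BAM file only, with 1 or 2 mutant reads
  } else {
    NA_character_       # not found in the BC sample at all
  }
}

flag_avenio <- classify_bc_flag(called_by_avenio = TRUE, bam_mutant_reads = 0)
flag_three_reads <- classify_bc_flag(called_by_avenio = FALSE, bam_mutant_reads = 3)
flag_two_reads <- classify_bc_flag(called_by_avenio = FALSE, bam_mutant_reads = 2)
flag_none <- classify_bc_flag(called_by_avenio = FALSE, bam_mutant_reads = 0)

c(flag_avenio, flag_three_reads, flag_two_reads, flag_none)
```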
Output
The output of the function is the updated results
master_list (a named list of
data.frames). This is automatically saved as the AVENIO_results_patients.rds file,
which means that after running
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
results contains the Avenio results updated
with the most recent samples.
Tissue vs. plasma NGS
The output of the function is essentially a collection of the data
presented in the results .csv files created by AVENIO.
HOWEVER, the tissue NGS and plasma NGS do not have
the same variables in the output, and we wanted to collect results from
both types of analyses in one dataset.
For example, tissue NGS does not have a
Plasma.Volume..mL. variable which is present in the plasma
NGS output, and plasma NGS does not have a Sample.Primer
variable which is used for the tissue NGS, etc.
Because of this, AvenioUpdate has to know if the sample being
analyzed has been created with the tissue or plasma NGS protocol (see
details in Material). This also means that the
final output across all included individuals contains variables from
both tissue and plasma NGS even though only a few patients have been
investigated with tissue NGS. The reason is that I have
to concatenate all samples, and this is only possible if all
rows have the same number of variables and the same variable names.
However, the tissue variables for samples analyzed with plasma NGS
just contain NA and can therefore be completely ignored, and
vice versa for tissue NGS, as demonstrated in the image below:

Tissue NGS samples also have a unique sample_index
structure. So take a look in the sample_index section for more information.
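The NA padding can be reproduced on two tiny data.frames. This is a sketch in base R and not the package's own code: bind_fill() is a hypothetical helper, the sample names and values are invented, and only the two variable names mentioned above are used.

```r
# Hypothetical helper: row-binds two data.frames, filling missing columns with NA
bind_fill <- function(a, b) {
  all_cols <- union(names(a), names(b))
  for (col in setdiff(all_cols, names(a))) a[[col]] <- NA
  for (col in setdiff(all_cols, names(b))) b[[col]] <- NA
  rbind(a[all_cols], b[all_cols])
}

# One plasma sample and one tissue sample with invented values
plasma <- data.frame(
  sample = "sampleA",
  Plasma.Volume..mL. = 4,
  stringsAsFactors = FALSE
)
tissue <- data.frame(
  sample = "sampleB",
  Sample.Primer = "primerX",
  stringsAsFactors = FALSE
)

combined <- bind_fill(plasma, tissue)
combined
```

The plasma row has NA in Sample.Primer and the tissue row has NA in Plasma.Volume..mL., which mirrors the layout of the combined results.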
Example
An example of how the function is used is shown below:
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# Example path to the specific run
test_path <- "//Synology_m1/Synology_folder/AVENIO/AVENIO_results/Plasma-ANei_Hsi__lGmaJ1k_Ji_d3O"
# Updating the results file
results <- add_run_to_list(master_list = results,
Directory = test_path)
End messages
Several messages are printed during the execution of the function.
However, for everyday use only the last few messages are of importance.
An example of this could be:
#> 1
#> Before the dataset consisted of 223 individuals and 572 samples analyzed
#> 2
#> Now the dataset consists of 223 individuals and 586 samples analyzed
#> 3
#> The following projects have been updated with this many samples
#> # A tibble: 1 × 2
#> Project n
#> <chr> <dbl>
#> 1 MonAlec 14
#> 4
#> And the following samples have been added to the dataset
#> [1] "MonAlec_83Z7BF2N4Y12L933_200723_BC" "MonAlec_YZ0PPZ1A5JLCB27C_220211_BC"
#> [3] "MonAlec_K6YFO3FWXOFI99UZ_201123_BC" "MonAlec_CQHQ4QA5K295LU8D_190911_BC"
#> [5] "MonAlec_B14Q10RS3BRU807A_210304_BC" "MonAlec_6KYRWVT7W2KW1U48_220208_BC"
#> [7] "MonAlec_PQ9NESTYL82NBF9B_200813_BC" "MonAlec_UE2R8PRMAT4T3IXD_190611_BC"
#> [9] "MonAlec_2UCZEDI4F35T6ILM_200728_BC" "MonAlec_IT8BJVO03C08FV04_191002_BC"
#> [11] "MonAlec_YBNHFHWVB0HBC3YY_200115_BC" "MonAlec_9Y6DEL9FY4T37CX7_201202_BC"
#> [13] "MonAlec_4OOA9FR46IB4BXBJ_191101_BC" "MonAlec_FDEDCZO3IR8TENC0_211119_BC"
#> 5
#> Saving updated list of patients
- The first message explains how many individuals and samples the
dataset consisted of before the addition of the new run
- The second message explains how many individuals and samples the
dataset consists of after the addition of the new run
- The third message explains which projects have been updated and how
many samples have been added to each project
- The fourth message prints the
sample_index values (see above)
that have been added to the dataset
- The fifth message explains that the updated list of patients has
been saved
It is good practice to look through these messages after
add_run_to_list() has been executed, to
make sure the data has been updated as expected.
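The counts in the first two messages can also be recomputed by hand. The sketch below assumes that every data.frame in the list has a sample_index column with one row per variant, so each sample is counted once with unique(). count_dataset() is a hypothetical helper and the data are invented.

```r
# Hypothetical helper: number of individuals and of distinct samples in a results list
count_dataset <- function(master_list) {
  sample_ids <- unlist(lapply(master_list, function(df) df$sample_index))
  c(individuals = length(master_list), samples = length(unique(sample_ids)))
}

# Invented "before" state: two patients, three samples
before <- list(
  "1401362941" = data.frame(
    sample_index = c("ProjA_pt26_191113", "ProjA_pt26_191113", "ProjA_pt26_200205"),
    stringsAsFactors = FALSE
  ),
  "0101010101" = data.frame(
    sample_index = "ProjA_pt27_200101",
    stringsAsFactors = FALSE
  )
)

# Invented "after" state: one new sample added for the second patient
after <- before
after[["0101010101"]] <- rbind(
  after[["0101010101"]],
  data.frame(sample_index = "ProjA_pt27_200101_BC", stringsAsFactors = FALSE)
)

counts_before <- count_dataset(before)
counts_after <- count_dataset(after)
counts_after - counts_before
```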
add_new_key()
This is the second main
function of the package and is used to add new possible entries to the
Project, Sample_note, and Material columns in the AVENIO_runs.xlsx file. This step is included to
ensure typos are spotted, so you don’t accidentally assign
e.g. a wrong project name to a sample.
The function takes two mandatory arguments:
- key which is the key name you want to add as a possible
entry
- variable which is the variable name in
AVENIO_runs.xlsx you want to add the key to
And one optional argument:
- synology_path which is the path to the AVENIO_runs.xlsx
and AVENIO_results_patients.rds files. (Default:
“//Synology_m1/Synology_folder/AVENIO/”)
Current included entries for Project, Sample_note, and Material can
be viewed with:
readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_keys.rds")
Output
There is no output of this function, just a message telling you what
variable has been updated with the new key
Example
An example of how the function can be used:
add_new_key(key = "test",
variable = "Project")
This should print the following messages
#> Adding the new key: 'test' for the variable: 'Project'.
#> DONE!
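The reason for keeping a list of allowed keys is that any entry outside the list stands out immediately. The sketch below illustrates the idea in base R. The structure of AVENIO_keys.rds is not described in this manual, so the named list of character vectors used here is an assumption, and find_unknown_entries() is a hypothetical helper.

```r
# Assumed structure: one character vector of allowed entries per variable
allowed_keys <- list(
  Project = c("MonAlec", "NARLAL"),
  Material = c("cfDNA", "BC", "tissue")
)

# Hypothetical helper: entries that are not among the allowed keys
find_unknown_entries <- function(values, allowed) {
  unique(values[!values %in% allowed])
}

# Entries typed into the Project column, one of them with a typo
typed_projects <- c("MonAlec", "NARLAL", "MonAlex", "MonAlec")
unknown_projects <- find_unknown_entries(typed_projects, allowed_keys$Project)
unknown_projects
```

An unknown entry is either a typo to correct in AVENIO_runs.xlsx or a genuinely new key that should be added with add_new_key().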
create_simple_output()
This is the third main function of the package and
is used to get a quick overview of a specific patient. The function
takes two mandatory arguments:
- df_list which is the AVENIO_results_patients.rds loaded
in using readRDS()
- CPR_number which is the CPR number of the patient you
want to investigate
And two optional arguments:
- synonymous which is a logical variable that determines
if synonymous mutations should be included in the output. (Default:
TRUE)
- synology_path which is the path to the AVENIO_runs.xlsx
and AVENIO_results_patients.rds files. (Default:
“//Synology_m1/Synology_folder/AVENIO/”)
Output
The output is a list with two entries. The first entry
is just the CPR number to understand which patient the information is
gathered from. The second entry is a data.frame with the
following 14 variables where each row is a gene mutation detected in the
patient:
- sample_index - Unique for each sample and is
explained above
- Class - Classification of the mutation
(FUSION,INDEL,SNV)
- Gene - The gene where the mutation is located
- AA - The amino acid change
- Description - Description of the mutation
e.g. missense
- Flags - Flags for the mutation (detailed
below)
- MAF - The mutational allele fraction
- Variant_depth - The number of identified reads with
the variant
- Unique_depth - The number of unique reads on the
position of the variant
- Analysis - Name of the sequencing run in the Avenio
system
- Sample.ID - Name of the sample in the Avenio
system
- Sample_note - Note of the sample in the
AVENIO_runs.xlsx file (BL, Tx, BC, Unknown)
- Material - Material of the sample in the
AVENIO_runs.xlsx file (cfDNA, BC, size_selection, cfChIP)
- Notes - Any notes manually added in the
AVENIO_runs.xlsx file
If no mutations are detected in a sample for a patient, the run is
still added, but the Gene, AA etc. variables just contain NA.
For now, the output is limited to these
specific variables. However, upon request I will look into ways to
modify the output so that the many other variables in the Avenio .csv
files can also be extracted in this simple format.
Example
An example of how the function is used is shown below. Note that the
CPR number is not a real CPR number.
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# Extracting the simple output
overview <- create_simple_output(df_list = results,
CPR = "1401362941",
synonymous = FALSE)
# CPR number
overview[[1]]
# Data.frame with the 14 variables
overview[[2]]
Which gives the following output:
#> [1] "1401362941"
| sample_index | Class | Gene | AA | Description | Flags | MAF | Variant_depth | Unique_depth | Analysis | Sample.ID | Sample_note | Material | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NARLAL_60_191113 | SNV | CDH18 | p.Arg134His | Missense | BC_mut | 0.17% | 17 | 9734 | 20221125 | 60A | BL | cfDNA | NA |
| NARLAL_60_191113 | SNV | EGFR | p.Arg149Trp;p.Arg149Trp;p.Arg149Trp;p.Arg149Trp | Missense;Missense;Missense;Missense | BAM, BC_mut | 0.05% | 4 | 7820 | 20221125 | 60A | BL | cfDNA | NA |
| NARLAL_60_191113 | SNV | ERBB2 | p.Cys334Ser;p.Cys334Ser;p.Cys319Ser | Missense;Missense;Missense | | 0.15% | 13 | 8434 | 20221125 | 60A | BL | cfDNA | NA |
| NARLAL_60_191113 | SNV | MET | p.Thr1010Ile;p.Thr992Ile | Missense;Missense | | 0.48% | 39 | 8186 | 20221125 | 60A | BL | cfDNA | NA |
| NARLAL_60_191113 | SNV | PDZRN3 | p.Leu650Gln;p.Leu367Gln | Missense;Missense | BAM, BC_mut | 0.03% | 3 | 10695 | 20221125 | 60A | BL | cfDNA | NA |
| NARLAL_60_191113 | SNV | RET | p.Val706Met;p.Val706Met | Missense;Missense | BAM, BC_mut | 0.02% | 3 | 17920 | 20221125 | 60A | BL | cfDNA | NA |
| NARLAL_60_191113_BC | SNV | APC | p.Ser837* | Stop gained | | 1.68% | 171 | 10203 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | BRCA2 | p.Ile1633Phe | Missense | | 0.30% | 26 | 8576 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | CDH18 | p.Arg134His | Missense | | 0.14% | 18 | 13199 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | DCAF12L2 | p.Leu288Met | Missense | | 0.16% | 16 | 10151 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | EGFR | p.Arg149Trp;p.Arg149Trp;p.Arg149Trp;p.Arg149Trp | Missense;Missense;Missense;Missense | | 0.12% | 12 | 9992 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | GPR139 | p.Gly224Glu | Missense | | 0.10% | 13 | 12497 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | KIT | p.Lys807Asn | Missense | | 1.74% | 240 | 13769 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | MKRN3 | p.Ala51Val;p.Ala51Val | Missense;Missense | | 0.26% | 34 | 13031 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | NAV3 | p.Gly589Cys;p.Gly589Cys | Missense;Missense | | 0.16% | 18 | 11207 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | NFE2L2 | p.Asp29Tyr | Missense | | 0.17% | 13 | 7642 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | NYAP2 | p.Ala307Thr | Missense | | 0.09% | 12 | 13741 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | PDZRN3 | p.Leu650Gln;p.Leu367Gln | Missense;Missense | | 0.65% | 92 | 14148 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | PIK3CG | p.His295Gln | Missense | | 1.19% | 136 | 11420 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | RET | p.Val706Met;p.Val706Met | Missense;Missense | | 0.07% | 18 | 26821 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | USP29 | p.Asp790His | Missense | | 0.79% | 80 | 10070 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_191113_BC | SNV | WIPF1 | p.Val152Gly;p.Val152Gly;p.Val152Gly | Missense;Missense;Missense | | 0.11% | 19 | 16588 | 20221208 | narlal_60_BC | BC | BC | NA |
| NARLAL_60_200205 | SNV | CDH18 | p.Arg134His | Missense | BC_mut | 0.20% | 10 | 5100 | 20221125 | 60B | Tx | cfDNA | NA |
| NARLAL_60_200205 | SNV | NYAP2 | p.Ala307Thr | Missense | BAM, BC_mut | 0.02% | 1 | 4932 | 20221125 | 60B | Tx | cfDNA | NA |
Small functions (Not mandatory read)
As explained above, I have created some functions which can be used
to explore the datasets a bit more. These functions are not mandatory to
use but can be useful in order to get an overview of the projects,
samples, runs, etc.
included_analyses()
This function is used to get an overview of the runs from AVENIO_runs.xlsx that are included in the
results dataset.
The function takes one mandatory argument:
- master_list - The AVENIO_results_patients.rds file loaded using readRDS()
Output
The output is a named list of length 2. The first
entry (“Overview:”) is a data.frame with four
variables:
- Analysis.ID - The run ID from the AVENIO_runs.xlsx
file
- n - Number of samples from that run that are
included in the results dataset
- Analysis.Name - The name of the run in the Avenio
system
- Analysis.Type - The analysis type of the run
(Plasma in the example output below)
The second entry (“Details:”) is a named list of
data.frames where each data.frame is named
with the run ID. Each data.frame contains five
variables:
- sample_index - Unique for each sample and is
explained above
- Sample.ID - Name of the sample in the Avenio
system
- Analysis.Name - The name of the run in the Avenio
system
- Analysis.ID - The run ID from the AVENIO_runs.xlsx
file
- Analysis.Type - The analysis type of the run
(Plasma in the example output below)
Available entries can be shown using the following:
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
analyses <- included_analyses(results)
names(analyses[["Details:"]])
OR
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
analyses <- included_analyses(results)
analyses[["Overview:"]]$Analysis.ID
Example
An example of how the function is used is shown below:
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# Extract included analyses
analyses <- included_analyses(results)
# Getting an overview of the analyses
analyses[["Overview:"]]
Which should result in something like this:
| Analysis.ID | n | Analysis.Name | Analysis.Type |
| AE5hBlinjEZN6ZZ1L7898Rzc | 1 | 20210204 | Plasma |
| AEdiX1AaYJBFM4G7307KEA2u | 16 | FIOL6 | Plasma |
| AEepGWqCaABI04g5DQESU97x | 14 | Copy_FaXb_20230112 | Plasma |
| AEHqc78-RkhKwqARpiG9V9Qi | 13 | 20211111 | Plasma |
| AEjxrRrIVjJGzbUFckTwpTpp | 12 | Copy_hjRD_202008212 | Plasma |
| AeUIQZkyte1MjIyV0h0MS0aI | 9 | 20211201 | Plasma |
And the second entry is viewed using:
# Reading the results file
results <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# Extract included analyses
analyses <- included_analyses(results)
# Exploring specific run
analyses[["Details:"]][["AGU232uEYZZKe4oI95IrJu7o"]]
Which should result in something like this:
| sample_index | Sample.ID | Analysis.Name | Analysis.ID | Analysis.Type |
| MonAlec_00QWTDYUH1E0VRM8_200217 | 4 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| MonAlec_SEAVIJ1HQS7PSXM1_200217 | 3 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| MonAlec_9XPREVQOLXK3JE4S_200304 | 1 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| MonAlec_9XPREVQOLXK3JE4S_200401 | 2 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt08_200316 | 9 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt11_200326 | 12 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt11_200414 | 13 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt06_200211 | 7 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt09_200320 | 14 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt09_200416 | 15 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| MonAlec_YBNHFHWVB0HBC3YY_200211 | 5 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt05_200204 | 6 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt07_200306 | 8 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt10_200325 | 10 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Batezo_pt10_200415 | 11 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
| Madsen_A10_200407 | 16 | Copy_rceq_Monalec21042020 | AGU232uEYZZKe4oI95IrJu7o | Plasma |
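The structure described above (an overview data.frame plus one data.frame per run) can be explored with base R. The sketch below does not call the package; it builds a small hypothetical stand-in for the output of included_analyses(), with made-up run IDs (RUN_A, RUN_B) and sample names, and shows one way to stack the per-run tables and look up which run a sample belongs to.

```r
# Hypothetical stand-in for the list returned by included_analyses().
# Run IDs, sample names and counts are invented for illustration only.
analyses <- list(
  "Overview:" = data.frame(
    Analysis.ID = c("RUN_A", "RUN_B"),
    n = c(2L, 1L),
    Analysis.Name = c("20210204", "FIOL6"),
    Analysis.Type = c("Plasma", "Plasma")
  ),
  "Details:" = list(
    RUN_A = data.frame(
      sample_index = c("Proj_pt01_200101", "Proj_pt02_200102"),
      Sample.ID = c("1", "2"),
      Analysis.Name = "20210204",
      Analysis.ID = "RUN_A"
    ),
    RUN_B = data.frame(
      sample_index = "Proj_pt01_200301",
      Sample.ID = "7",
      Analysis.Name = "FIOL6",
      Analysis.ID = "RUN_B"
    )
  )
)

# Stack the per-run data.frames into a single table
details_all <- do.call(rbind, analyses[["Details:"]])
rownames(details_all) <- NULL

# Look up the run(s) containing samples from one patient
hits <- details_all[grepl("^Proj_pt01_", details_all$sample_index), ]
print(hits[, c("sample_index", "Analysis.ID")])

# Cross-check: rows per run should match the n column of the overview
n_per_run <- table(details_all$Analysis.ID)
overview <- analyses[["Overview:"]]
counts_match <- all(as.vector(n_per_run[overview$Analysis.ID]) == overview$n)
```

With the real object, the same lines should work after replacing the stand-in with the output of included_analyses(results), provided the column names match those shown above.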
explore_AVENIO_runs()
This function is used to get an overview of the AVENIO_runs.xlsx file and explore the sample
entries that have been entered. This can be used to get an overview of
which samples are missing information, which analyses have not been
added to the results dataset, etc.
The function takes no mandatory arguments, but takes
three optional arguments:
- Info - Name of the information of interest. If
Info = NULL (default), all information is displayed
- silent - A Boolean determining whether messages
should be displayed. If silent = FALSE (default), messages
are displayed
- synology_path - The path to the AVENIO_runs.xlsx
and AVENIO_results_patients.rds files (default:
“//Synology_m1/Synology_folder/AVENIO/”)
Output
A list with different information on the
AVENIO_runs.xlsx file (Info = NULL) or the specific object as
determined by Info.
Example
Two examples of how the function is used are shown below:
# Exploring the AVENIO_runs.xlsx file
results <- explore_AVENIO_runs(silent = TRUE)
# Getting the information of total entries
results$Total_entries
Should return:
#> [1] 1932
And:
# Getting the information on the samples with all the required information
explore_AVENIO_runs(Info = "Required",
silent = TRUE)
Should return something like:
| CPR | Name_in_project | Project | Sample_date | Run_name | Run_ID | Sample_name | Material | Sample_note |
| 2405460036 | pt01 | Batezo | 2019-10-25 | Copy_mrAB_Monalec_batezo_1 | AajqaHzI9_hCH4wYCbDAIYx4 | 12 | cfDNA | Unknown |
| 2405460036 | pt01 | Batezo | 2019-10-02 | Copy_WQIe_Monalec_batezo_m_salic | ARkgLARFX_lCQL3jQm9qG7IA | 12 | cfDNA | Tx |
| 2405460036 | pt01 | Batezo | 2020-04-14 | Copy_hjRD_202008212 | AEjxrRrIVjJGzbUFckTwpTpp | B6 | cfDNA | Tx |
| 2405460036 | pt01 | Batezo | 2019-10-25 | Copy_supo_20230303 | AGwQDzxcqrdDlpzlcS10s1En | pippin_febr_Louise_1 | size_selection | Unknown |
| 2405460036 | pt01 | Batezo | 2019-10-25 | 20231004 | AU-GJVH_R-RAXa0UUil1MCik | pt01bc | BC | BC |
| 1102610063 | pt02 | Batezo | 2019-11-04 | Monalec_Batezo_2 | AYO0H3fNbPBEOKwJA9dv9IH0 | 6 | cfDNA | Unknown |
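Once the "Required" entry has been extracted, it can be filtered like any other data.frame. The following is a minimal sketch on a hypothetical stand-in table: the column names are taken from the output above, but the CPR values are placeholders and storing Sample_date as a Date is an assumption about the real object.

```r
# Hypothetical stand-in for explore_AVENIO_runs(Info = "Required").
# CPR values are placeholders; Sample_date as Date is an assumption.
required <- data.frame(
  CPR = c("0000000001", "0000000001", "0000000001", "0000000002"),
  Name_in_project = c("pt01", "pt01", "pt01", "pt02"),
  Project = "Batezo",
  Sample_date = as.Date(c("2019-10-25", "2019-10-02",
                          "2019-10-25", "2019-11-04")),
  Material = c("cfDNA", "cfDNA", "BC", "cfDNA"),
  Sample_note = c("Unknown", "Tx", "BC", "Unknown")
)

# cfDNA samples for one patient, ordered by sample date
pt01 <- required[required$Name_in_project == "pt01" &
                   required$Material == "cfDNA", ]
pt01 <- pt01[order(pt01$Sample_date), ]
print(pt01)

# Number of samples per material across the table
material_counts <- table(required$Material)
print(material_counts)
```

If the real Sample_date column is stored as text, converting it with as.Date() before ordering would likely be needed.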
Output description
The output of the function is a named list with
different entries. I have included a function
(explore_AVENIO_runs_Info()) which displays a brief
explanation of each entry. The function has no mandatory arguments. To
view the explanations run the following command:
# Getting explanations for entries in explore_AVENIO_runs()
explore_AVENIO_runs_Info()
Should return something like:
| Name | Description |
| Total_entries | Total number of samples registered in Avenio_runs.xlsx |
| Complete_entries | Number of samples with complete info and able to be included in AVENIO_results_patients.rds |
| Required | Samples containing all the required information |
| Material_stats | Number of samples with each designated type of material |
| Time_stats | Number of samples with each designated sample type |
| Unincluded_analyses | Runs entered in Avenio_runs.xlsx but is not present in AVENIO_results_patients.rds |
| Unincluded_CPRs | CPRs entered in Avenio_runs.xlsx but is not present in AVENIO_results_patients.rds |
| Incomplete_IDs | Incomplete Run IDs |
| Incomplete_dates | Samples where dates are missing or wrongly formatted |
| Incomplete_names | Samples where the sample name, project name or name in project is missing |
| Incomplete_material | Samples where the material information is missing |
result_stats()
This function is similar to explore_AVENIO_runs() but is
used to get an overview of the results dataset. The function extracts
information from the AVENIO_results_patients.rds file and
generates some basic statistics on the dataset.
The function takes no mandatory arguments, but takes
three optional arguments:
- Info - Name of the information of interest. If
Info = NULL (default), all information is displayed
- silent - A Boolean determining whether messages
should be displayed. If silent = FALSE (default), messages
are displayed
- synology_path - The path to the AVENIO_runs.xlsx
and AVENIO_results_patients.rds files (default:
“//Synology_m1/Synology_folder/AVENIO/”)
Output
A list with different stats on
AVENIO_results_patients.rds (Info = NULL) or the specific
object as determined by Info.
Example
Two examples of how the function is used are shown below:
# Exploring the results file
results <- result_stats(silent = TRUE)
# Getting the information on all mutations
results$All_mutations
Should return:
| Gene | n |
| EGFR | 829 |
| TP53 | 756 |
| APC | 393 |
| KIT | 359 |
| PIK3CG | 320 |
| USP29 | 312 |
| MET | 303 |
| PDZRN3 | 297 |
| BRCA2 | 261 |
| KRAS | 231 |
And:
# Getting basic statistics on the NGS results
result_stats(silent = TRUE,
Info = "Basestats")
Should return something like:
| stat | n |
| Patients | 744 |
| Samples | 1916 |
| Runs | 145 |
| Mutations | 7745 |
| cfDNA | 1491 |
| BC | 299 |
| tissue | 59 |
| size_selection | 35 |
| cfChIP | 17 |
| reanalyze | 15 |
| Tx | 922 |
| BL | 482 |
| Unknown | 142 |
| tumor_BL | 50 |
| tumor_Tx | 7 |
| Post-Tx | 6 |
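The Basestats table mixes several kinds of counts in one stat column, so it helps to subset it by name before doing arithmetic. As a small worked example using the numbers shown above, the six material rows (cfDNA, BC, tissue, size_selection, cfChIP, reanalyze) add up to the Samples row: 1491 + 299 + 59 + 35 + 17 + 15 = 1916. The sketch below re-types those rows by hand rather than calling result_stats(), and the percent column is my own addition for illustration.

```r
# Rows copied by hand from the example Basestats output above
basestats <- data.frame(
  stat = c("Samples", "cfDNA", "BC", "tissue",
           "size_selection", "cfChIP", "reanalyze"),
  n = c(1916, 1491, 299, 59, 35, 17, 15)
)

# Split the total from the per-material rows
total <- basestats$n[basestats$stat == "Samples"]
materials <- basestats[basestats$stat != "Samples", ]

# Share of all samples contributed by each material (illustrative column)
materials$percent <- round(100 * materials$n / total, 1)
print(materials)

# The material counts should account for every sample
materials_sum_to_total <- sum(materials$n) == total
```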
Output description
The output of the function is a named list with
different entries. I have included a function
(result_stats_Info()) which displays a brief explanation of
each entry. The function has no mandatory arguments. To view the
explanations run the following command:
# Getting explanations for entries in result_stats()
result_stats_Info()
Should return something like:
| Name | Description |
| Basestats | Number of samples, runs, patients, different materials etc. |
| Projectstats | Number of patients and samples in each project |
| Missing | Samples present in AVENIO_runs.xlsx but not present in the results data set |
| All_mutations | Number of times each gene is mutated across all samples |
| Relevant_SNV | Number of SNVs detected not classified as BC or synonymous mutations |
| Relevant_INDEL | Number of INDELs detected not classified as BC or synonymous mutations |
| BC_in_plasma | Number of times each gene is mutated in plasma but classified as BC mutation |
| Fusions_project | Number of patients for each project with detectable EML4-ALK with DNAfusion |
| Fusions_sample | Number of samples for each project with detectable EML4-ALK with DNAfusion |
| Fusions_variant | Number of patients with each EML4-ALK variant |
| Fusions_NC | Information on the non classified EML4-ALK variants |
| Lengths | Distribution of fragmentlengths across different materials |
| Depths | Distribution of unique depths across different materials |
| Reads | Distribution of mapped reads across different materials |
| On_target | Distribution of on target percents across different materials |
unlist_frames()
This function takes the list of data.frames containing all mutation
information and creates a single data.frame with the results, allowing
downstream dplyr manipulation on all results. The
figure shows how a list of data.frames is merged into a single
data.frame.

The function takes one mandatory argument:
- df_list - The AVENIO_results_patients.rds file loaded
using readRDS()
Output
A data.frame where each row is a mutated gene, containing all
the variables from the AVENIO .csv files. One additional
variable, CPR, is added; it is extracted from the names of
the input df_list.
Example
An example of how the function is used is shown below. Again, the
CPR number is not a real CPR number. The output is quite large
(many rows), so I have only included 10 rows and selected specific
variables of interest.
master <- readRDS("//Synology_m1/Synology_folder/AVENIO/AVENIO_results_patients.rds")
# The list is passed by position as the single mandatory argument
master_df <- unlist_frames(master)
print(master_df)
Which gives the following output:
| CPR | sample_index | Analysis.Name | Analysis.ID | Mutation.Class | Gene |
| 1909208904 | IDA-MRD_A_180607 | IDAMRD_2.0 | AZR_EFZEsjxHLKcIz2BtCTIF | NA | NA |
| 2709397795 | PETERS_LBD_231114 | Copy_GlAY_20231207 | Aa4vMChZHTdFe76BAPB2R8Pl | INDEL | EGFR |
| 2709397795 | PETERS_LBD_231114 | Copy_GlAY_20231207 | Aa4vMChZHTdFe76BAPB2R8Pl | SNV | TP53 |
| 2709397795 | PETERS_LBD_240801 | 20241030 | AKmgHQ2-w3RHTqVjtWTKLz60 | INDEL | EGFR |
| 2709397795 | PETERS_LBD_240801 | 20241030 | AKmgHQ2-w3RHTqVjtWTKLz60 | CNV | EGFR |
| 2709397795 | PETERS_LBD_240801 | 20241030 | AKmgHQ2-w3RHTqVjtWTKLz60 | CNV | MET |
| 3007652609 | super_AUH2_200917 | 20201007 MonBazo | ARPd_ATt_oFBNrmQ7Yk_8YCo | SNV | NPAP1 |
| 1802857106 | NARLAL_47_170411 | 20220922 | ANEzevwHG-lNYLnNIR6aTGu9 | SNV | FBXL7 |
| 1802857106 | NARLAL_47_170411 | 20220922 | ANEzevwHG-lNYLnNIR6aTGu9 | SNV | RET |
| 1509995826 | Batezo_pt15_210204 | 20210312 | AHdh6Q0hlCRO263hSjLFJNQ7 | SNV | GRM8 |
DNAfusion implementation (Not mandatory read)
As described above, DNAfusion is used to analyze all
patients for EML4-ALK gene fusions. When a run is added, the BAM files
from that run are investigated with DNAfusion before the
samples are combined with other sequencing runs from the same
patients.
The following two functions from DNAfusion are used:
res <- DNAfusion::EML4_ALK_detection(file = "FILE.bam")
var_res <- DNAfusion::find_variants(file = "FILE.bam")
Based on the output of these two functions, a new mutation is
classified for that sample and added as a results row for that patient.
The relevant variables that are based on the DNAfusion
results include:
- Flags - Set to “DNAfusion” to indicate that it was detected
by DNAfusion
- Mutation.Class - Set to “FUSION”, similar to the Avenio
format
- Gene - Set to “ALK;EML4”, similar to the Avenio format
- Variant.Description - Classifies the EML4-ALK fusion
variant, if possible
- Allelle.Fraction - The number of soft-clipped reads at
EML4 divided by the coverage at the ALK breakpoint
- Genomic.Position - The breakpoint positions in ALK and
EML4
- Variant.Depth - The number of soft-clipped reads at the EML4
breakpoint
- Unique.Depth - The unique number of reads at the ALK
breakpoint
- Exon.Number - Indicates which intron the breakpoint
occurs in for ALK and EML4
These results are also maintained when exploring the results with
create_simple_output().
Upon exploration, it was found that the variant
Genomic.Position = "chr2:29223530;chr2:42295516" was
detected in several samples but could not be classified. If this
variant is detected, the fusion is classified as
Variant.Description = "Uncertain_variant" and should be
removed from downstream analyses.
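Removing the uncertain variant before downstream analysis is a one-line filter. The sketch below uses a hypothetical table of fusion rows (the sample names, variant labels and depths are invented) and also recomputes an allele fraction as Variant.Depth divided by Unique.Depth. That ratio is my reading of the variable descriptions above, not a confirmed formula from the package, so the column is named AF_check to keep it separate from the stored value.

```r
# Hypothetical fusion rows; only the column names and the
# "Uncertain_variant" label come from the text above.
fusions <- data.frame(
  sample_index = c("Proj_pt01_200101", "Proj_pt02_200102",
                   "Proj_pt03_200103"),
  Flags = "DNAfusion",
  Mutation.Class = "FUSION",
  Gene = "ALK;EML4",
  Variant.Description = c("variant_A", "Uncertain_variant", "variant_B"),
  Variant.Depth = c(12, 4, 30),
  Unique.Depth = c(1200, 2000, 1500)
)

# Drop fusions that could not be classified
kept <- fusions[fusions$Variant.Description != "Uncertain_variant", ]

# Assumed relationship: soft-clipped reads / coverage at the breakpoint
kept$AF_check <- kept$Variant.Depth / kept$Unique.Depth
print(kept[, c("sample_index", "Variant.Description", "AF_check")])
```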
list_rebuild.R AND AVENIO_runs.xlsx recovery (Not mandatory read)
Use this only if absolutely necessary. On the GitHub page
there is a file called list_rebuild.R. This file is used to
rebuild AVENIO_results_patients.rds in
case the file is lost or corrupted, or in case an error
in the dataset has been found and all samples need to be
reanalyzed.
The file only works if the AVENIO_runs.xlsx file is not lost.
list_rebuild.R has to be updated manually alongside
AVENIO_runs.xlsx, but Christoffer will make sure this happens relatively
regularly.
When the entire script has been run, you should be able to investigate
which runs (if any) have not been added to the dataset but are
present in the AVENIO_runs.xlsx file using:
explore_AVENIO_runs(silent = TRUE,
Info = "Unincluded_analyses")
You can then use add_run_to_list() to add the
missing runs to the dataset.
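Conceptually, the check for unincluded runs is a set difference between the run IDs registered in the spreadsheet and those present in the results. The sketch below illustrates that idea with two hypothetical character vectors of run IDs; it is not how explore_AVENIO_runs() is implemented, and it only prints the runs that would need to be passed to add_run_to_list().

```r
# Hypothetical run IDs; the real ones come from AVENIO_runs.xlsx
# and AVENIO_results_patients.rds.
runs_in_xlsx <- c("RUN_A", "RUN_B", "RUN_C", "RUN_B")
runs_in_results <- c("RUN_A", "RUN_B")

# Runs registered in the spreadsheet but absent from the results
unincluded <- setdiff(unique(runs_in_xlsx), runs_in_results)

# Each of these would then be added with add_run_to_list()
for (run_id in unincluded) {
  message("Run still to be added: ", run_id)
}
```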
In case the AVENIO_runs.xlsx file is lost, a backup
of this file can be found on GenomeDK at
/faststorage/project/alabs_projects/Avenio_BAM/. This file is
updated every month by Christoffer to ensure it is relatively up to date
at all times.
Session info (Not mandatory read)
It is good practice to include this information in tutorials and
manuals to ensure reproducibility and for troubleshooting if something
does not work. If your code works, you can disregard this section.
#> Warning in system2("quarto", "-V", stdout = TRUE, env = paste0("TMPDIR=", :
#> running command '"quarto"
#> TMPDIR=C:/Users/chris/AppData/Local/Temp/RtmpY5hBbU/file62544aaa5e7e -V' had
#> status 1
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.5.0 (2025-04-11 ucrt)
#> os Windows 11 x64 (build 26200)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Danish_Denmark.utf8
#> ctype Danish_Denmark.utf8
#> tz Europe/Copenhagen
#> date 2026-01-05
#> pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> quarto NA @ C:\\PROGRA~1\\RStudio\\RESOUR~1\\app\\bin\\quarto\\bin\\quarto.exe
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> AvenioUpdate * 1.13.0 2025-10-20 [1] Github (CTrierMaansson/AvenioUpdate@a57aa01)
#> crayon * 1.5.3 2024-06-20 [1] CRAN (R 4.5.0)
#> dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.5.0)
#>
#> [1] C:/Users/chris/AppData/Local/R/win-library/4.5
#> [2] C:/Program Files/R/R-4.5.0/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────